Digital image generation through an active lighting system

ABSTRACT

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for an active lighting system. In one aspect, a method includes receiving a first image of the physical document having a first glare signature and a second image of the physical document having a second glare signature that is different from the first glare signature; determining a first glare map of the first image and a second glare map of the second image; comparing the first glare map to the second glare map; and generating the digital image based on the comparison of the first and second glare maps.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 16/437,869, filed Jun. 11, 2019, which claims the benefit under 35 U.S.C. § 119(e) of U.S. Patent Application No. 62/683,993, entitled “Digital Image Generation Through An Active Lighting System,” filed Jun. 12, 2018, which is incorporated herein by reference in its entirety.

TECHNICAL FIELD

This specification generally relates to systems and methods for capturing documents for general analysis.

BACKGROUND

This specification describes technologies for detecting tampered physical documents and extracting information from documents based on digital images. The use of physical identification documents has been pervasive in various industries for decades. Moreover, in recent years, digital images of identification documents are increasingly being used for conducting secure, electronic transactions. Current techniques for authenticating imaged identification documents involve systems configured to scrutinize certain security features physically embedded into the underlying document. These security features are, by design, extremely difficult to replicate, and therefore effectively thwart attempts to produce counterfeit copies of the document. Many security features currently in use include intricate print patterns, digital watermarks, micro-printed text, unique emblems or logos, holograms, and the like. Conventional authentication techniques for processing these imaged identification documents are performed by systematically decoding information from a digital watermark and/or employing text or pattern matching techniques to verify the legitimacy of one or more other security features.

SUMMARY

This specification describes technologies for an active lighting system employed to fuse digital images for enhanced document analysis. More specifically, implementations are directed to techniques (e.g., methods, systems, devices, etc.) for an active lighting system that generates a merged image of a physical document based on glare maps generated from multiple images of the physical document, each having a distinct glare signature. The merged image may be employed in detecting the digital or physical tampering of physical documents based on one or more aspects that are intrinsic to a digital image, and not, for example, associated with extracted text (e.g., text identified by optical character recognition) or other encoded data (e.g., data encoded in security features or machine readable zones). Such aspects include pixel features that provide evidence of physical and/or digital tampering, as well as certain benign pixel features that include, but are not limited to: environmental, capture device, credential wear, lighting effects, hardware/software quantization, and/or digital compression effects. In some examples, these tamper-detection techniques are applied to one or more specific regions of interest (e.g., high value identification regions of the physical documents).

Digital images of physical documents, as discussed in this disclosure, are digital images of the physical documents suitable for use in electronic transactions. The term “electronic transactions” refers broadly to any computer-facilitated exchange between a possessor of a physical or imaged identification document and one or more third parties. Electronic transactions can be conducted in-person or remotely via computer networks. Some electronic transactions may include the exchange of currency, but others may not. Suitable physical documents for conducting secure, electronic transactions may include, but are not limited to, personal identity, employment, or professional credentials or certifications, or other high value identity-assertion documentation (e.g., a driver's license or a passport). Further, in some implementations, suitable physical documents may include so-called “breeder documents” (e.g., birth certificates, marriage certificates, social security documents, as well as utility bills, service bills, and other vital data correlation documents). The term “physical document” may be used throughout this disclosure when referring to any document designed for identity certification, assertion, or authorization that includes identification data. The “identification data” may include one or more of the following: an identification photograph, biographical information (e.g., a date of birth, an identification serial number, a social security number, a physical or electronic mailing address, a height, eye color, and gender), and/or one or more machine readable zones (MRZs) (e.g., a barcode or a QR code). In some implementations, the identification data may further include other biometric information in addition to the ID photo, such as fingerprints, hand geometry, retina patterns, iris patterns, handwriting patterns, and/or other physical morphological identity characteristics. Regions of the imaged identification document that contain this identification data are referred to generally as “high value regions” throughout the present disclosure because of their importance in identifying the document's possessor in an electronic transaction.

One or more embodiments of the present disclosure result from a realization that conventional techniques for authenticating imaged physical documents are difficult to implement, prone to failure, and/or suffer from severe security vulnerabilities. As one example, authentication techniques reliant upon security features can be difficult to implement on a large scale because they require modifications to the physical documents to insert physical security features. This amounts to a reissuance of the credential to each possessor. These modifications can take a long time to propagate through a large universe of physical credentials, such as passports and driver's licenses, because users tend to replace them infrequently. Thus, for instance, it could take years to fully implement a digital watermarking system that requires coded data to be embedded in each document. These conventional authentication techniques can also be prone to failure because the decoding and/or text/pattern recognition routines require the identification document to be imaged in very particular lighting conditions and/or alignment orientations. It often takes the user several attempts to achieve a suitable image capture (e.g., images that are free of unwanted image artifacts, such as glare, motion blur, blooming, dirty lens blur, and so forth). Capture modalities, such as passport scanners with active UV and near-IR lighting, may also not be relevant when a mobile device, such as a smartphone, is desired as the capture device. Moreover, while conventional security features can be effective at inhibiting or preventing successful counterfeiting, they are not helpful in detecting whether an authentically issued physical identification document has been digitally or manually tampered with. For example, the possessor of an authentic identification document may tamper with that document by replacing or altering certain high value regions (e.g., photos, biometrics, biographics, and MRZs) that are critical for identifying the possessor in electronic transactions.

This type of tampering can often be achieved without affecting the embedded security features (e.g., where the security features do not overlap with the high value regions of the identification document), and thus will not be detected by conventional authentication techniques, which allows the document possessor to hide or outright replace critical information in order to conceal his/her identity. Moreover, it is relatively simple to manipulate non-security feature aspects of the identification document, including the high value regions, using commercially available image editing software. Of course, attempts at tampering with physical documents tend to vary in type and level of sophistication. At the lower sophistication levels, entire regions of the identification document may be altered or replaced (digitally or physically) without making any attempts to match texture or font. Other attempts may be more refined. For example, the forger may utilize special software in an attempt to meticulously recreate backgrounds, security features, and the like. As yet another example, the forger may attempt to homogenize the modified portions of the image by taking a new live photo of a printout or screenshot of the splice or tamper. These and a myriad of other tamper techniques can be used to effectively undermine conventional authentication methods.

Accordingly, embodiments of the present disclosure aim to resolve these and other problems with conventional authentication techniques by providing a fundamental paradigm shift in the field that does not rely solely on security features to verify the legitimacy of imaged physical documents. In a general implementation, the present disclosure relates to a system that includes one or more processors and a computer-readable storage device coupled to the one or more processors. Instructions are stored on the computer-readable storage device that, when executed by the one or more processors, cause the one or more processors to perform operations. These operations include receiving a first image of a physical document having a first glare signature and a second image of the physical document having a second glare signature that is different from the first glare signature. The first image and the second image are aligned based on the physical document by 1) estimating a homography using a features from accelerated segment test (FAST) detector and an oriented FAST and rotated Binary Robust Independent Elementary Features (ORB) detector to provide a description of texture around the physical document as depicted in each image, and 2) warping each pixel in the second image to align with the first image through a bi-linear interpolation. A first glare map of the first image is determined by generating a first greyscale image of the first image. A second glare map of the second image is determined by generating a second greyscale image of the second image. The first glare map and the second glare map are dilated to expand the regions of glare represented on each map. The first glare map is compared to the second glare map. A digital image is generated by replacing the regions of glare in the first image with respective mapped regions from the second image, wherein the mapped regions from the second image do not include glare.

In another general implementation, a computer-implemented method for providing a digital image of a physical document includes receiving a first image of the physical document having a first glare signature and a second image of the physical document having a second glare signature that is different from the first glare signature. A first glare map of the first image and a second glare map of the second image are determined. The first glare map is compared to the second glare map. The digital image is generated based on the comparison of the first and second glare maps.

In yet another general implementation, one or more non-transitory computer-readable storage media are coupled to one or more processors and have instructions stored thereon which, when executed by the one or more processors, cause the one or more processors to perform operations. These operations include receiving a first image of a physical document having a first glare signature and a second image of the physical document having a second glare signature that is different from the first glare signature. A first glare map of the first image and a second glare map of the second image are determined. The first glare map is compared to the second glare map. A digital image is generated based on the comparison of the first and second glare maps.

In an aspect combinable with any of the general implementations, the first image is taken with a flash, and the second image is taken without a flash.

In another aspect combinable with any of the previous aspects, the first image and the second image are taken in succession within a threshold temporal distance.

In another aspect combinable with any of the previous aspects, the operations or method include, before determining the first glare map of the first image and the second glare map of the second image, aligning the first image and the second image based on the physical document.

In another aspect combinable with any of the previous aspects, the aligning of the first image and the second image includes: estimating a homography using a FAST detector and an ORB detector to provide a description of texture around the physical document as depicted in each image; and warping each pixel in the second image to align with the first image through a bi-linear interpolation.

In another aspect combinable with any of the previous aspects, the homography is estimated based on a random sample consensus (RANSAC) algorithm.

In another aspect combinable with any of the previous aspects, the determining of the first glare map of the first image includes generating a first greyscale image of the first image, and the determining of the second glare map of the second image includes generating a second greyscale image of the second image.

In another aspect combinable with any of the previous aspects, the first glare map and the second glare map are each binary images where each pixel represents either glare or no glare.

In another aspect combinable with any of the previous aspects, the operations or method include, before comparing the first glare map to the second glare map, dilating the first glare map and the second glare map to expand the regions of glare represented on each map.

In another aspect combinable with any of the previous aspects, the digital image is generated by replacing the regions of glare in the first image with respective mapped regions from the second image, wherein the mapped regions from the second image do not include glare.

In another aspect combinable with any of the previous aspects, the mapped regions from the second image are merged into the first image to form the digital image through Poisson image blending.

In another aspect combinable with any of the previous aspects, gradient information throughout the replaced regions of glare is employed to interpolate a color propagated from a boundary of each replaced glare region in the generated digital image.

In another aspect combinable with any of the previous aspects, the mapped regions from the second image are merged into the first image to form the digital image through a Mean Value Coordinates (MVC) for Instant Image Cloning algorithm.

In another aspect combinable with any of the previous aspects, the digital image is employed in an analysis of the physical document to identify text or data elements in the physical document.

In another aspect combinable with any of the previous aspects, the analysis of the physical document includes at least one of optical character recognition (OCR), optical word recognition (OWR), intelligent character recognition (ICR), intelligent word recognition (IWR), natural language processing (NLP), or machine learning.

In another aspect combinable with any of the previous aspects, the digital image is employed in an analysis of the physical document to detect digital tampering or physical tampering.

In another aspect combinable with any of the previous aspects, the physical document is a professional or government-issued credential or certification.

It is appreciated that techniques, in accordance with the present disclosure, can include any combination of the aspects and features described herein. That is, techniques in accordance with the present disclosure are not limited to the combinations of aspects and features specifically described herein, but also may include any combination of the aspects and features provided.

The details of one or more implementations of the present disclosure are set forth in the accompanying drawings and the description below. Other features and advantages of the present disclosure will be apparent from the description and drawings, and from the claims.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 depicts an example environment that can be employed to execute implementations of the present disclosure.

FIG. 2 depicts an example system for capturing two or more images of a document for processing according to implementations of the present disclosure.

FIGS. 3A-3C depict example images of a physical document.

FIGS. 4A-4B depict flow diagrams of example processes employed within an active lighting system to generate a merged image.

FIG. 5 shows an example of a computing device and a mobile computing device that can be employed to execute implementations of the present disclosure.

DETAILED DESCRIPTION

One or more embodiments of the present disclosure involve systems and methods for an active lighting system for generating a merged image of a physical document for conducting electronic transactions. In particular, certain embodiments may include processing received images of a physical document to generate glare maps based on the distinct glare signatures of each received image. The glare maps are compared to generate the merged image. The merged image may be employed in a broad range of applications including OCR, face recognition (removing glare from the face), and tamper detection. For example, the merged image may be employed in detecting the digital or physical tampering of the physical document based on one or more aspects that are intrinsic to a digital image, and not, for example, associated with extracted text (e.g., text identified by optical character recognition) or other encoded data (e.g., data encoded in security features or machine readable zones).

FIG. 1 depicts an example environment 100 that can be employed to execute implementations of the present disclosure. The example environment 100 includes mobile computing devices 104 and 106, a back-end system 112, and a network 110. In some implementations, the network 110 includes a local area network (LAN), a wide area network (WAN), the Internet, or a combination thereof, and connects web sites, devices (e.g., the mobile computing devices 104 and 106), and back-end systems (e.g., the back-end system 112). In some implementations, the network 110 can be accessed over a wired and/or a wireless communications link. For example, mobile computing devices (e.g., the smartphone device 104 and the tablet device 106) can use a cellular network to access the network 110.

In the depicted example, the back-end system 112 includes at least one server system 114 and a data store 116. In some implementations, the back-end system 112 provides access to one or more computer-implemented services that users 102 can interact with using the mobile computing devices 104 and/or 106. The computer-implemented services may be hosted on, for example, the at least one server system 114 and the data store 116. The computer-implemented services may include, for example, an image merging service. In some implementations, the back-end system 112 includes computer systems employing clustered computers and components to act as a single pool of seamless resources when accessed through the network 110. For example, such implementations may be used in data center, cloud computing, storage area network (SAN), and network attached storage (NAS) applications. In some implementations, the back-end system 112 is deployed and provides computer-implemented services through a virtual machine(s).

The mobile computing devices 104 and 106 may each include any appropriate type of computing device, such as a laptop computer, a handheld computer, a tablet computer, a personal digital assistant (PDA), a cellular telephone, a network appliance, a camera, a smartphone, an enhanced general packet radio service (EGPRS) mobile phone, a media player, a navigation device, an email device, a game console, or an appropriate combination of any two or more of these devices or other data processing devices. In the depicted example, the computing device 104 is provided as a smartphone and the computing device 106 is provided as a tablet-computing device. It is contemplated, however, that implementations of the present disclosure can be realized with any of the appropriate computing devices, such as those mentioned previously. For example, the mobile computing devices 104 and 106 can also be a less readily portable type of computing device, such as a desktop computer, laptop computer, smart appliance, gaming console, and so forth.

Implementations of the present disclosure are described in further detail herein with reference to an example context. The example context includes the user 102 providing an image of a document for processing. For example, the user 102 may need to provide a government-issued identifier or credential, such as a passport, to be validated. It is contemplated, however, that implementations of the present disclosure can be realized in any appropriate context. Other example contexts include automatic user-worn badge authentication, such as when walking through a security gate; data extraction from general materials printed with non-diffuse materials, such as polycarbonate gift cards or product packaging with multiple non-machine-readable identifiers; and machine vision applications in, for example, a factory where objects roll on conveyor belts at high speeds.

FIG. 2 depicts an example system 200 for capturing two or more images of a physical document 220 for processing according to implementations of the present disclosure. Device 202 is substantially similar to the mobile computing devices 104 and 106 depicted in FIG. 1. The device 202 includes one or more lights 208 and one or more cameras 206 that capture image(s) and/or video data of a field of view in proximity to the device 202. In some instances, the camera(s) 206 may be peripheral device(s) connected to the device 202 over a wired or wireless network, as opposed to built-in components of the device 202. The camera(s) 206 can capture images and/or video data of a physical object, such as the physical document 220. In the example system 200, the imaged object is a physical document 220, such as described in detail above. The image data generated by the camera(s) 206 can include at least two still images and/or video data of the imaged object. The light(s) 208 may generate a “flash” of light when the camera(s) 206 capture an image. This flash of light can be on any band of the electromagnetic spectrum that is relevant to document processing in the face of glare. Examples include not only visible light, but also infrared light and ultraviolet light.

Implementations support the use of any suitable type of device 202 that a user, such as the user 102 depicted in FIG. 1, can employ to interact with the application 204 through the user interface (UI) 212. The UI 212 displays content to the user. The application 204 may be programmed to assist an untrained, non-technical user by employing the camera(s) 206 to capture images 214 and to actively change the lighting between captured images. For example, the application 204 may be programmed to capture images within a configurable threshold of one another and/or with or without a flash (e.g., a flash and a no-flash image). In some implementations, the application 204 provides the captured images to an image merging service provided by a back-end system, such as the back-end system 112 depicted in FIG. 1. In some implementations, the application 204 includes the image merging service. The image merging service, either remotely hosted or included in the application 204, employs an active lighting system such as described in detail herein.

In some implementations, an active lighting system merges at least two images with distinct glare signatures (e.g., dissimilar glare regions) to form a merged or composite image. A glare signature may refer to the glare regions of a particular image. Glare, or specular reflection, refers to the reflected light captured in the image(s). In some examples, dissimilar glare refers to glare that is not uniform between the images, such that each image has a distinct glare signature. These distinct glare signatures provide the structure to merge the images together so that details, such as the identification data elements 222, that are hidden or occluded may be revealed in the generated merged image.

For example, an image taken with no flash may include glare region(s) where a reflection of the ambient lighting from the environment (e.g., from a light fixture in the ceiling) off the captured physical document 220 was captured in the image. These regions of glare in images of the physical document 220 may obstruct, for example, some or all of the characters in the identification data elements 222. These obstructed glare regions may render the characters illegible for processing. The active lighting system uses this no-flash image along with an image of the same physical document 220, but taken in different lighting (e.g., with a “flash” of light from the light 208), to form a merged image (see, e.g., FIG. 3C) that shows the obstructed details of the physical document 220. Other ways of capturing images with distinct glare signatures for use within the described active lighting system include capturing images using dissimilar colors of light or with a flash in differing locations (e.g., two lights 208 positioned on different corners of the device 202). Examples of no-flash and flash images are depicted in FIGS. 3A and 3B respectively.

In some implementations, the bulk of a merged image generated by the active lighting system is taken from one of the images submitted for processing. This image may be referred to as the primary image. The other image(s) may be used to fill in the glare regions of the primary image. These image(s) may be referred to as the secondary image(s). For example, for a flash and a no-flash image of the physical document 220, the flash image may serve as the primary image and the no-flash image as the secondary image (or vice versa). In such an example, the active lighting system may use the pixels from the primary image, except for those in the glare region(s), for the merged image. For the pixels in the glare region(s), the active lighting system detects the glare region(s) on the primary image and then blends or interpolates the color in each glare region using the secondary image as the source of interpolation. In other implementations, the merged image is generated by the active lighting system with each pixel as an actual mixture between the respective pixels in the primary and secondary images, aligned according to, for example, the depicted document 220 (e.g., some of the color comes from no flash and some of it comes from flash). Example processes for generating a merged image by the active lighting system are described in FIGS. 4A and 4B.

Once generated by an active lighting system, analysis of a merged image of the physical document 220 may be employed to, for example, determine one or more identification data elements 222 that are present in the physical document 220. Such identification data elements 222 can include text portions (e.g., alphanumeric text) of any suitable length, such as one or more characters, words, phrases, sentences, numbers, symbols, barcodes, and/or other text elements. The identification data elements 222 can also include at least a portion of one or more graphic elements and/or images that are printed onto the physical document 220. The document analysis can include any suitable technique for OCR, OWR, ICR, IWR, NLP, machine learning, parsing, and/or other techniques for identifying particular text elements or other data elements in the physical document 220. The document analysis can include detecting digital tampering or physical tampering by removing the nuisance of glare, which could potentially cover up or oversaturate a tamper region, such as a manipulated date of birth. Other techniques for document analysis are discussed in U.S. Patent Publication No. 2018/0107887, which is incorporated by reference herein.
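As an illustration only, a minimal OCR pass over a merged image might look like the following sketch. It assumes the open-source pytesseract wrapper around the Tesseract OCR engine, which is not part of this disclosure; the function name extract_text is likewise illustrative.

```python
import cv2
import pytesseract  # assumed: pytesseract wrapper around the Tesseract OCR engine


def extract_text(merged_bgr):
    """Run OCR over a glare-free merged image (8-bit BGR array)."""
    rgb = cv2.cvtColor(merged_bgr, cv2.COLOR_BGR2RGB)  # Tesseract expects RGB ordering
    return pytesseract.image_to_string(rgb)
```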

FIGS. 3A-3C depict example images 300, 310, and 320, respectively. The example images 300, 310, and 320 depict the physical document 220 and each include a respective glare signature. The example image 300 includes a glare signature with glare regions 302 and is an example of an image taken with no flash. The example image 310 includes a glare signature with glare region 312 and is an example of an image taken with a flash. In the example image 310, where there previously was glare (e.g., glare regions 302) in the example no-flash image 300, the glare effect has been reduced and the visibility increased in those glare regions 302. However, as a consequence, there is another glare region, glare region 312, on the flash image corresponding to the newly introduced light source (e.g., the flash). Example image 320 is a merged image of the images 300 and 310 resulting from the active lighting system as described herein. The example merged image 320 includes a glare region 322 that represents the overlap between the glare signatures of images 300 and 310 (e.g., the two glare regions 302 and 312 from the images). Some merged images generated by the active lighting system include no glare region 322 when there is, for example, no overlap between glare regions of the primary and secondary image(s) and/or there is sufficient data to remove the glare regions fully from the merged image.

FIGS. 4A-4B depict flow diagrams of example processes 400 and 420, respectively. Processes 400 and 420 are employed within an active lighting system to generate a merged image of, for example, a document for analysis. For clarity of presentation, the description that follows generally describes methods 400 and 420 in the context of FIGS. 1-3C. However, it will be understood that methods 400 and 420 may each be performed, for example, by any other suitable system, environment, software, and hardware, or a combination of systems, environments, software, and hardware as appropriate. In some implementations, various steps of methods 400 and 420 can be run in parallel, in combination, in loops, or in any order.

In process 400, a primary image and a secondary image are captured (402) by, for example, a camera associated with a user device. In some implementations, to minimize the motion of the camera between image captures, the images are taken in succession one after the other and within a threshold temporal distance. Even with a short threshold, there may be a certain degree of camera movement between the images. To account for this movement, the images are aligned. To align the images, a homography (a 3×3 matrix) is estimated (404) using at least four point correspondences between the before and after movement images. The points are computed using a feature detector, such as the FAST algorithm, and the descriptor of each point, used to match points between the before and after images, is computed using a feature descriptor, such as the ORB algorithm. ORB is a fast, robust local feature descriptor that describes the neighborhood of pixel intensities around each location. ORB provides a description of the texture around a two-dimensional landmark (e.g., the physical document) in an image, which is conducive to matching other ORB descriptors in another image (e.g., matching between the primary and secondary images). The homography may be estimated through, for example, a RANSAC algorithm, which involves random trials of estimating the homography using four point correspondences selected at random, counting the number of inliers per trial, and choosing the homography with the maximum number of inlier matches. The images are aligned (406) by warping pixels based on the homography. For example, each pixel for the imaged document in one image is mapped to a corresponding pixel for the imaged document in the other image (e.g., the images are aligned according to the physical documents, which are assumed to lie in a 3D plane). Specifically, the point (x, y) is mapped to the other image using the coordinate [x′ y′ w′]′ = H·[x y 1]′, where H is the homography, [x y 1]′ is the column vector of the coordinate (x, y) as a homogeneous coordinate (1 is added as a third dimension), and [x′ y′ w′]′ is the column vector of the mapped homogeneous coordinate (which is converted back to non-homogeneous coordinates using (x′/w′, y′/w′)). In some implementations, H is a 3×3 matrix that warps homogeneous points lying in one world plane to another world plane under full perspective distortion. In some implementations, using the estimated homography, a bi-linear interpolation is employed to warp each pixel in the secondary image so that it aligns exactly with the primary image.
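The alignment step can be sketched with OpenCV, whose ORB implementation pairs a FAST keypoint detector with a rotated-BRIEF descriptor, consistent with the description above. This is a minimal sketch, assuming both captures are 8-bit BGR arrays at roughly the same scale; align_images and its parameters are illustrative, while the cv2 calls are standard OpenCV.

```python
import cv2
import numpy as np


def align_images(primary, secondary):
    """Warp `secondary` onto `primary` using a RANSAC-estimated homography."""
    # Detect keypoints and compute ORB descriptors (ORB uses FAST internally).
    orb = cv2.ORB_create(nfeatures=2000)
    kp1, des1 = orb.detectAndCompute(cv2.cvtColor(primary, cv2.COLOR_BGR2GRAY), None)
    kp2, des2 = orb.detectAndCompute(cv2.cvtColor(secondary, cv2.COLOR_BGR2GRAY), None)

    # Match descriptors between the two captures (Hamming distance for ORB).
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des2, des1), key=lambda m: m.distance)

    # Point correspondences mapping secondary -> primary.
    src = np.float32([kp2[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp1[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

    # RANSAC trials choose the 3x3 homography H with the most inlier matches.
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, ransacReprojThreshold=3.0)

    # Bilinear warp of every secondary pixel into the primary image's frame.
    h, w = primary.shape[:2]
    return cv2.warpPerspective(secondary, H, (w, h), flags=cv2.INTER_LINEAR)
```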

The aligned images are segmented (408) into glare and no-glare regions based on the glare signature of each image. In some implementations, to segment the images, both the primary and secondary images are converted to grayscale. Glare and no-glare regions within each greyscale image are identified based on a threshold. For example, a pixel with an intensity greater than a certain threshold value is categorized as glare (e.g., the pixel is a saturated or “white” pixel in the image). Based on this threshold value, a new binary image (e.g., black and white) is generated for each of the primary and secondary images. The binary image is a type of glare map where each pixel represents either glare or no glare. As an example, a threshold value may be set at 253 out of 255 intensities for an 8-bit grayscale image. Pixels that are greater than the threshold are assigned as foreground in the glare map; otherwise, they are assigned as background. Other techniques may be employed by process 400 to segment the images. For example, machine learning may be employed to train an algorithm, according to training data, to determine what constitutes glare and no-glare regions in an image. In a machine learning approach, a training set of data may be constructed and segmented to represent the ground truth. This affords an opportunity to ensure the boundary of the glare segmentation does not overlap with a critical feature, such as a digit or character of text, or such as a security feature.
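Continuing the illustrative OpenCV sketch, the threshold-based glare map described above might be computed as follows; the 253-of-255 cutoff mirrors the example in the text.

```python
def glare_map(image_bgr, threshold=253):
    """Binary glare map: foreground (255) where grayscale intensity exceeds the threshold."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    # Pixels brighter than the cutoff are treated as saturated (glare) pixels.
    _, binary = cv2.threshold(gray, threshold, 255, cv2.THRESH_BINARY)
    return binary
```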

For each of the segmented images (e.g., the glare maps), the glare regions are dilated (410) (e.g., the number of white pixels is expanded slightly to increase the size of the glare regions on each glare map). For example, the glare regions are enlarged until some threshold is met to capture the specular reflection around the edges of the glare regions. In some implementations, a glare detector can be employed to select the optimal pixel and/or group of pixels from each image to use for generating the merged image. For example, a 7×7 dilation kernel may be used for an image with a resolution of 300 dots per inch (DPI), with the kernel size scaled to the estimated size of the document in the images. The flash and no-flash glare maps may be dilated a configurable number of times in succession. The glare regions are filtered in each glare map to keep the largest glare region. The difference between the flash glare map and the no-flash glare map is computed, and a threshold is set for this difference to keep the positive values (e.g., values greater than or equal to 0). The largest glare region from this difference is retained.
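The dilation and largest-region filtering might be sketched as follows, again as an illustrative OpenCV example; the 7×7 kernel follows the 300 DPI example above, the iteration count is an assumed default, and cv2.subtract saturates at zero, which keeps only the positive values of the map difference.

```python
def dilate_and_filter(glare, iterations=2):
    """Dilate a binary glare map, then keep only its largest connected region."""
    kernel = np.ones((7, 7), np.uint8)  # 7x7 kernel for ~300 DPI captures
    dilated = cv2.dilate(glare, kernel, iterations=iterations)

    # Keep the largest glare blob; drop smaller speckle.
    n, labels, stats, _ = cv2.connectedComponentsWithStats(dilated)
    if n <= 1:
        return dilated  # no glare detected
    largest = 1 + np.argmax(stats[1:, cv2.CC_STAT_AREA])
    return np.where(labels == largest, 255, 0).astype(np.uint8)


# Glare present in the flash map but not the no-flash map (difference kept at >= 0):
# fill_region = cv2.subtract(dilate_and_filter(flash_map), dilate_and_filter(noflash_map))
```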

A blended image is generated (412) from the dilated glare maps. In some implementations, the glare regions in the primary image (e.g., the flash image) are replaced with the respective mapped regions of no-glare from the secondary image (e.g., the no-flash image). In some implementations, when there are glare regions in the same location on the depicted document in both the primary and secondary images, the pixels from the primary image (e.g., the flash image) are retained as the base-line for the merged image.

The boundary between the pixels copied from the secondary image to the primary image may be problematic for proper analysis of the document (e.g., tamper detection or OCR analysis may be hindered) when, for example, there are hard, high-frequency contrast areas across a text element of the imaged document. Various modes of blending the merged pixels may be employed in this step, such as Poisson image blending. As an example, to employ Poisson image blending, the gradient information, or the changes of intensity throughout the entire region that is to be cloned and/or replaced, is copied. The copied gradients are used to interpolate or create a new color that is propagated from the boundary of the corresponding region in the merged image. Employing this type of blending locks the color from the primary image into the copied/replacement region from the secondary image. In some examples, the regions are copied directly from the secondary image and inserted into the merged image without blending. In such examples, the OCR may be trained to recognize these areas properly and accurately determine the validity of the document and/or merged image. For example, pixels may be selected for the merged image from whichever of the source images has the least amount of glare. In another example, an accelerated approximation of the Poisson image blending algorithm, named MVC for Instant Image Cloning, may be used to, for example, increase performance with quality similar to Poisson blending.
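OpenCV's seamlessClone function implements Poisson image blending, so the glare fill-in described above might be sketched as follows; fill_glare and its arguments are illustrative, and region_mask is assumed to be the thresholded difference map from the previous step.

```python
def fill_glare(primary, secondary_aligned, region_mask):
    """Replace glare pixels in the primary image with Poisson-blended pixels
    taken from the aligned secondary image."""
    # Gradients inside the mask come from the secondary image; color is
    # propagated inward from the boundary of the region in the primary image.
    x, y, w, h = cv2.boundingRect(region_mask)
    center = (x + w // 2, y + h // 2)
    return cv2.seamlessClone(secondary_aligned, primary, region_mask,
                             center, cv2.NORMAL_CLONE)
```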

In some examples, when the type of document is known before the flash/no-flash merge operation, a template for that document type, such as the most recent US passport, may be used to make content-dependent decisions about how to merge flash and no-flash images. The captured pair of images are both registered to the template using the same homography-based warping technique used to register flash to no-flash. After registering to the template, variable regions specified in the template map to the same regions in both flash and no-flash images. If glare is detected over a portion of a high value region of interest, such as the last name text, the glare region may be expanded so that the entire last name is replaced in the flash image with the no-flash pixels, as opposed to just the glare pixels in that region. The template enables the identification of high value regions and, correspondingly, how to segment and blend in those regions. The document type may be provided by the user or automatically recognized using a separate recognition module.
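A hypothetical sketch of this template-driven expansion follows: given field boxes in template coordinates (the template_fields mapping below is an assumed input, not an API of this disclosure), any field the glare mask touches is marked for replacement in full.

```python
def expand_to_fields(region_mask, template_fields):
    """Grow the glare mask to cover every template field it touches.

    `template_fields` is a hypothetical mapping of field name -> (x, y, w, h)
    boxes in template coordinates, e.g. {"last_name": (120, 80, 300, 40)}.
    """
    expanded = region_mask.copy()
    for name, (x, y, w, h) in template_fields.items():
        if region_mask[y:y + h, x:x + w].any():
            # Glare overlaps this high value field: replace the whole field.
            expanded[y:y + h, x:x + w] = 255
    return expanded
```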

In process 420, a first and second image of a physical document are received (422). The first image includes a first glare signature, and the second image includes a second glare signature that is different from the first glare signature. A first glare map of the first image and a second glare map of the second image are determined (424). The first glare map is compared (426) to the second glare map. A digital image is generated (428) based on the comparison of the first and second glare maps, and the process 420 ends.
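Tying the sketches together, process 420 might be driven end-to-end as follows, using the illustrative helpers defined above (align_images, glare_map, dilate_and_filter, fill_glare); the flash image serves as the primary image here, per the earlier example.

```python
def merge_flash_noflash(flash, noflash):
    """End-to-end sketch of process 420: receive, map, compare, generate."""
    secondary = align_images(flash, noflash)             # register no-flash onto flash
    flash_map = dilate_and_filter(glare_map(flash))      # glare maps of each capture
    noflash_map = dilate_and_filter(glare_map(secondary))
    # Comparison: fill only where the flash image has glare and the no-flash does not.
    region = cv2.subtract(flash_map, noflash_map)
    return fill_glare(flash, secondary, region)          # merged, glare-reduced image
```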

FIG. 5 shows an example of a computing device 500 and a mobile computing device 550 that can be employed to execute implementations of the present disclosure. The computing device 500 is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The mobile computing device 550 is intended to represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smart-phones, and other similar computing devices. Additionally, the computing device 500 and/or 550 can include Universal Serial Bus (USB) flash drives. The USB flash drives may store operating systems and other applications. The USB flash drives can include input/output components, such as a wireless transmitter or USB connector that may be inserted into a USB port of another computing device. The components shown here, their connections and relationships, and their functions, are meant to be examples only, and are not meant to be limiting.

The computing device 500 includes a processor 502, a memory 504, a storage device 506, a high-speed interface 508, and a low-speed interface 512. In some implementations, the high-speed interface 508 connects to the memory 504 and multiple high-speed expansion ports 510. In some implementations, the low-speed interface 512 connects to a low-speed expansion port 514 and the storage device 506. Each of the processor 502, the memory 504, the storage device 506, the high-speed interface 508, the high-speed expansion ports 510, and the low-speed interface 512 are interconnected using various buses, and may be mounted on a common motherboard or in other manners as appropriate. The processor 502 can process instructions for execution within the computing device 500, including instructions stored in the memory 504 and/or on the storage device 506, to display graphical information for a graphical user interface (GUI) on an external input/output device, such as a display 516 coupled to the high-speed interface 508. In other implementations, multiple processors and/or multiple buses may be used, as appropriate, along with multiple memories and types of memory. In addition, multiple computing devices may be connected, with each device providing portions of the necessary operations (e.g., as a server bank, a group of blade servers, or a multi-processor system).

The memory 504 stores information within the computing device 500. In some implementations, the memory 504 is a volatile memory unit or units. In some implementations, the memory 504 is a non-volatile memory unit or units. The memory 504 may also be another form of computer-readable medium, such as a magnetic or optical disk.

The storage device 506 is capable of providing mass storage for the computing device 500. In some implementations, the storage device 506 may be or include a computer-readable medium, such as a floppy disk device, a hard disk device, an optical disk device, a tape device, a flash memory, or other similar solid-state memory device, or an array of devices, including devices in a storage area network or other configurations. Instructions can be stored in an information carrier. The instructions, when executed by one or more processing devices, such as the processor 502, perform one or more methods, such as those described above. The instructions can also be stored by one or more storage devices, such as computer-readable or machine-readable mediums, such as the memory 504, the storage device 506, or memory on the processor 502.

The high-speed interface 508 manages bandwidth-intensive operations for the computing device 500, while the low-speed interface 512 manages lower bandwidth-intensive operations. Such allocation of functions is an example only. In some implementations, the high-speed interface 508 is coupled to the memory 504, the display 516 (e.g., through a graphics processor or accelerator), and to the high-speed expansion ports 510, which may accept various expansion cards. In the implementation, the low-speed interface 512 is coupled to the storage device 506 and the low-speed expansion port 514. The low-speed expansion port 514, which may include various communication ports (e.g., USB, Bluetooth, Ethernet, wireless Ethernet), may be coupled to one or more input/output devices. Such input/output devices may include a scanner 530, a printing device 534, or a keyboard or mouse 536. The input/output devices may also be coupled to the low-speed expansion port 514 through a network adapter. Such network input/output devices may include, for example, a switch or router 532.

The computing device 500 may be implemented in a number of different forms, as shown in FIG. 5. For example, it may be implemented as a standard server 520, or multiple times in a group of such servers. In addition, it may be implemented in a personal computer such as a laptop computer 522. It may also be implemented as part of a rack server system 114. Alternatively, components from the computing device 500 may be combined with other components in a mobile device, such as a mobile computing device 550. Each of such devices may contain one or more of the computing device 500 and the mobile computing device 550, and an entire system may be made up of multiple computing devices communicating with each other.

The mobile computing device 550 includes a processor 552, a memory 564, an input/output device, such as a display 554, a communication interface 566, and a transceiver 568, among other components. The mobile computing device 550 may also be provided with a storage device, such as a micro-drive or other device, to provide additional storage. Each of the processor 552, the memory 564, the display 554, the communication interface 566, and the transceiver 568 are interconnected using various buses, and several of the components may be mounted on a common motherboard or in other manners as appropriate. In some implementations, the mobile computing device 550 may include a camera device (not shown).

The processor 552 can execute instructions within the mobile computing device 550, including instructions stored in the memory 564. The processor 552 may be implemented as a chipset of chips that include separate and multiple analog and digital processors. For example, the processor 552 may be a Complex Instruction Set Computer (CISC) processor, a Reduced Instruction Set Computer (RISC) processor, or a Minimal Instruction Set Computer (MISC) processor. The processor 552 may provide, for example, for coordination of the other components of the mobile computing device 550, such as control of UIs, applications run by the mobile computing device 550, and/or wireless communication by the mobile computing device 550.

The processor 552 may communicate with a user through a control interface 558 and a display interface 556 coupled to the display 554. The display 554 may be, for example, a Thin-Film-Transistor Liquid Crystal Display (TFT) display, an Organic Light Emitting Diode (OLED) display, or other appropriate display technology. The display interface 556 may comprise appropriate circuitry for driving the display 554 to present graphical and other information to a user. The control interface 558 may receive commands from a user and convert them for submission to the processor 552. In addition, an external interface 562 may provide communication with the processor 552, so as to enable near area communication of the mobile computing device 550 with other devices. The external interface 562 may provide, for example, for wired communication in some implementations, or for wireless communication in other implementations, and multiple interfaces may also be used.

The memory 564 stores information within the mobile computing device 550. The memory 564 can be implemented as one or more of a computer-readable medium or media, a volatile memory unit or units, or a non-volatile memory unit or units. An expansion memory 574 may also be provided and connected to the mobile computing device 550 through an expansion interface 572, which may include, for example, a Single In Line Memory Module (SIMM) card interface. The expansion memory 574 may provide extra storage space for the mobile computing device 550, or may also store applications or other information for the mobile computing device 550. Specifically, the expansion memory 574 may include instructions to carry out or supplement the processes described above, and may also include secure information. Thus, for example, the expansion memory 574 may be provided as a security module for the mobile computing device 550, and may be programmed with instructions that permit secure use of the mobile computing device 550. In addition, secure applications may be provided via the SIMM cards, along with additional information, such as placing identifying information on the SIMM card in a non-hackable manner.

The memory may include, for example, flash memory and/or non-volatile random access memory (NVRAM), as discussed below. In some implementations, instructions are stored in an information carrier. The instructions, when executed by one or more processing devices, such as the processor 552, perform one or more methods, such as those described above. The instructions can also be stored by one or more storage devices, such as one or more computer-readable or machine-readable mediums, such as the memory 564, the expansion memory 574, or memory on the processor 552. In some implementations, the instructions can be received in a propagated signal, such as over the transceiver 568 or the external interface 562.

The mobile computing device 550 may communicate wirelessly through the communication interface 566, which may include digital signal processing circuitry where necessary. The communication interface 566 may provide for communications under various modes or protocols, such as Global System for Mobile communications (GSM) voice calls, Short Message Service (SMS), Enhanced Messaging Service (EMS), Multimedia Messaging Service (MMS) messaging, code division multiple access (CDMA), time division multiple access (TDMA), Personal Digital Cellular (PDC), Wideband Code Division Multiple Access (WCDMA), CDMA2000, and General Packet Radio Service (GPRS). Such communication may occur, for example, through the transceiver 568 using a radio frequency. In addition, short-range communication, such as using Bluetooth or Wi-Fi, may occur. In addition, a Global Positioning System (GPS) receiver module 570 may provide additional navigation- and location-related wireless data to the mobile computing device 550, which may be used as appropriate by applications running on the mobile computing device 550.

The mobile computing device 550 may also communicate audibly using an audio codec 560, which may receive spoken information from a user and convert it to usable digital information. The audio codec 560 may likewise generate audible sound for a user, such as through a speaker, e.g., in a handset of the mobile computing device 550. Such sound may include sound from voice telephone calls, may include recorded sound (e.g., voice messages, music files, etc.), and may also include sound generated by applications operating on the mobile computing device 550. The mobile computing device 550 may be implemented in a number of different forms, as shown in FIG. 5. For example, it may be implemented as the mobile computing devices 104 and/or 106 of FIG. 1 and the device 202 of FIG. 2. It may also be implemented as part of a smart-phone, personal digital assistant, or other similar mobile device.

Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, specially designed application specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various implementations can include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be for a special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.

These computer programs (also known as programs, software, software applications or code) include machine instructions for a programmable processor, and can be implemented in a high-level procedural, object-oriented, assembly, and/or machine language. As used herein, the terms machine-readable medium and computer-readable medium refer to any computer program product, apparatus and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term machine-readable signal refers to any signal used to provide machine instructions and/or data to a programmable processor.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having a display device (e.g., a cathode ray tube (CRT) or liquid crystal display (LCD) monitor) for displaying information to the user and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front end component (e.g., a client computer having a GUI or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back end, middleware, or front end components. The components of the system can be interconnected by any form or medium of digital data communication, such as the network 110 of FIG. 1. Examples of communication networks include a LAN, a WAN, and the Internet.

The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

Although a few implementations have been described in detail above, other modifications are possible. For example, while a client application is described as accessing the delegate(s), in other implementations the delegate(s) may be employed by other applications implemented by one or more processors, such as an application executing on one or more servers. In addition, the logic flows depicted in the figures do not require the particular order shown, or sequential order, to achieve desirable results. In addition, other actions may be provided, or actions may be eliminated, from the described flows, and other components may be added to, or removed from, the described systems. Accordingly, other implementations are within the scope of the following claims.

What is claimed is:
1. A computer-implemented method for providing a digital image of a physical document, the method comprising: receiving a first image of the physical document having a first glare signature and a second image of the physical document having a second glare signature that is different from the first glare signature; determining a first glare map of the first image and a second glare map of the second image; comparing the first glare map to the second glare map; and generating the digital image based on the comparison of the first and second glare maps.
2. The computer-implemented method of claim 1, wherein the first image is taken with a flash, and wherein the second image is taken without a flash.
3. The computer-implemented method of claim 1, wherein the first image and the second image are taken in succession within a threshold temporal distance.
4. The computer-implemented method of claim 1, comprising: before determining the first glare map of the first image and the second glare map of the second image, aligning the first image and the second image based on the physical document.
5. The computer-implemented method of claim 4, wherein aligning the first image and the second image includes: estimating a homography using a features from accelerated segment test (FAST) detector and an oriented FAST and rotated Binary Robust Independent Elementary Features (ORB) detector to provide a description of texture around the physical document as depicted in each image; and warping each pixel in the second image with the first image through a bi-linear interpolation.
6. The computer-implemented method of claim 5, wherein the homography is estimated based on a random sample consensus (RANSAC) algorithm.
7. The computer-implemented method of claim 1, wherein determining the first glare map of the first image includes generating a first greyscale image of the first image, and determining a second glare map of the second image includes generating a second greyscale image of the second image.

8. The computer-implemented method of claim 1, wherein the first glare map and the second glare map are each binary images where each pixel represents either glare or no glare.
9. The computer-implemented method of claim 1, further comprising: before comparing the first glare map to the second glare map, dilating the first glare map and the second glare map to expand regions of glare represented on each map.
10. The computer-implemented method of claim 9, wherein the digital image is generated by replacing the regions of glare in the first image with respective mapped regions from the second image, wherein the mapped regions from the second image do not include glare.
11. The computer-implemented method of claim 10, wherein the mapped regions from the second image are merged into the first image to form the digital image through Poisson image blending.
12. The computer-implemented method of claim 11, wherein gradient information throughout the replaced regions of glare is employed to interpolate a color propagated from a boundary of each replaced glare region in the generated digital image.

13. The computer-implemented method of claim 10, wherein the mapped regions from the second image are merged into the first image to form the digital image through a Mean Value Coordinates (MVC) for Instant Image Cloning algorithm.
14. The computer-implemented method of claim 1, wherein the digital image is employed in an analysis of the physical document to identify text or data elements in the physical document.

15. The computer-implemented method of claim 14, wherein the analysis of the physical document includes at least one of optical character recognition (OCR), optical word recognition (OWR), intelligent character recognition (ICR), intelligent word recognition (IWR), natural language processing (NLP), or machine learning.
16. The computer-implemented method of claim 1, wherein the digital image is employed in an analysis of the physical document to detect digital tampering or physical tampering.
17. The computer-implemented method of claim 1, wherein the physical document is a professional or government-issued credential or certification.
18. One or more non-transitory computer-readable storage media coupled to one or more processors and having instructions stored thereon which, when executed by the one or more processors, cause the one or more processors to perform operations comprising: receiving a first image of a physical document having a first glare signature and a second image of the physical document having a second glare signature that is different from the first glare signature; determining a first glare map of the first image and a second glare map of the second image; comparing the first glare map to the second glare map; and generating a digital image based on the comparison of the first and second glare maps.
19. The one or more non-transitory computer-readable storage media of claim 18, wherein the digital image is employed in an analysis of the physical document to identify text or data elements in the physical document or to detect digital tampering or physical tampering, wherein the analysis of the physical document includes at least one of optical character recognition (OCR), optical word recognition (OWR), intelligent character recognition (ICR), intelligent word recognition (IWR), natural language processing (NLP), or machine learning, and wherein the physical document is a professional or government-issued credential or certification.
20. A system, comprising: one or more processors; and a computer-readable storage device coupled to the one or more processors and having instructions stored thereon which, when executed by the one or more processors, cause the one or more processors to perform operations comprising: receiving a first image of a physical document having a first glare signature and a second image of the physical document having a second glare signature that is different from the first glare signature; aligning the first image and the second image based on the physical document by: estimating a homography using a features from accelerated segment test (FAST) detector and an oriented FAST and rotated Binary Robust Independent Elementary Features (ORB) detector to provide a description of texture around the physical document as depicted in each image; and warping each pixel in the second image with the first image through a bi-linear interpolation; determining a first glare map of the first image by generating a first greyscale image of the first image; determining a second glare map of the second image by generating a second greyscale image of the second image; dilating the first glare map and the second glare map to expand regions of glare represented on each map; comparing the first glare map to the second glare map; and generating a digital image by replacing the regions of glare in the first image with respective mapped regions from the second image, wherein the mapped regions from the second image do not include glare.